⚡️ Speed up method CommentMapper.visit_AsyncFunctionDef by 11% in PR #678 (standalone-fto-async) #767
Closed

codeflash-ai wants to merge 100 commits into standalone-fto-async from codeflash/optimize-pr678-2025-09-26T19.52.34
Conversation
[LSP] Ensure optimizer cleanup on server shutdown or when the client suddenly disconnects
…licate-global-assignments-when-reverting-helpers
…/duplicate-global-assignments-when-reverting-helpers`)

The optimized code achieves a **17% speedup** by eliminating redundant CST parsing operations, which are the most expensive parts of the function according to the line profiler.

**Key optimizations:**

1. **Eliminate duplicate parsing**: The original code parsed `src_module_code` and `dst_module_code` multiple times. The optimized version introduces `_extract_global_statements_once()`, which parses each module only once and reuses the parsed CST objects throughout the function (see the sketch after this list).
2. **Reuse parsed modules**: Instead of re-parsing `dst_module_code` after modifications, the optimized version conditionally reuses the already-parsed `dst_module` when no global statements need insertion, avoiding unnecessary `cst.parse_module()` calls.
3. **Early termination**: Added an early return when `new_collector.assignments` is empty, avoiding the expensive `GlobalAssignmentTransformer` creation and visitation when there is nothing to transform.
4. **Minor optimization in uniqueness check**: Added a fast-path identity check (`stmt is existing_stmt`) before the expensive `deep_equals()` comparison, though this has minimal impact.

**Performance impact by test case type:**

- **Empty/minimal cases**: the highest gains (59-88% faster), thanks to the early-termination optimizations.
- **Standard cases**: consistent 20-30% improvements from reduced parsing.
- **Large-scale tests**: significant benefits (18-23% faster), as parsing overhead scales with code size.

The optimization is most effective for workloads with moderate to large code files where CST parsing dominates the runtime, as evidenced by the original profiler showing 70%+ of time spent in `cst.parse_module()` and `module.visit()` operations.
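A minimal sketch of the parse-once pattern and the identity fast-path, assuming libcst. The helper name `_extract_global_statements_once()` comes from the PR text, but its body, the sample inputs, and `_is_new()` are illustrative assumptions, not the actual implementation:

```python
import libcst as cst

def _extract_global_statements_once(src_module_code: str, dst_module_code: str):
    """Parse each module exactly once and return the parsed CSTs for reuse."""
    src_module = cst.parse_module(src_module_code)  # parsed once, reused everywhere
    dst_module = cst.parse_module(dst_module_code)  # parsed once, reused everywhere
    # Illustrative stand-in for the real global-statement collection logic.
    src_globals = [
        stmt for stmt in src_module.body if isinstance(stmt, cst.SimpleStatementLine)
    ]
    return src_module, dst_module, src_globals

def _is_new(stmt: cst.CSTNode, existing_stmts) -> bool:
    # Fast-path identity check before the expensive deep_equals() comparison.
    return not any(
        stmt is existing or stmt.deep_equals(existing) for existing in existing_stmts
    )

src_module, dst_module, globals_to_insert = _extract_global_statements_once(
    "X = 1\n", "def f():\n    return 1\n"
)
if not globals_to_insert:
    # Nothing to insert: reuse the already-parsed dst_module instead of
    # calling cst.parse_module() on dst_module_code a second time.
    final_code = dst_module.code
```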
Signed-off-by: Saurabh Misra <[email protected]>
…25-08-25T18.50.33 ⚡️ Speed up function `add_global_assignments` by 18% in PR #683 (`fix/duplicate-global-assignments-when-reverting-helpers`)
…cs-in-diff [LSP] return diff functions grouped by file
* lsp: get new/modified functions inside a git commit
* better name
* refactor
* revert
* save optimization patches metadata
* typo
* lsp: get previous optimizations
* fix patch name in non-lsp mode
* ⚡️ Speed up function `get_patches_metadata` by 45% in PR #690 (`worktree/persist-optimization-patches`)

  The optimized code achieves a **44% speedup** through two key optimizations (sketched below):

  **1. Added `@lru_cache(maxsize=1)` to `get_patches_dir_for_project()`**
  - This caches the Path object construction, avoiding repeated calls to `get_git_project_id()` and `Path()` creation.
  - The line profiler shows this function's total time dropped from 5.32ms to being completely eliminated from the hot path in `get_patches_metadata()`.
  - Since `get_git_project_id()` was already cached but still being called repeatedly, this second level of caching eliminates that redundancy.

  **2. Replaced `read_text()` + `json.loads()` with `open()` + `json.load()`**
  - Using `json.load()` with a file handle is more efficient than reading the entire file into memory with `read_text()` and then parsing it.
  - This avoids the intermediate string creation and is particularly beneficial for larger JSON files.
  - Added explicit UTF-8 encoding for consistency.

  **Performance impact by test type:**
  - **Basic cases** (small/missing files): 45-65% faster, benefiting primarily from the caching optimization.
  - **Edge cases** (malformed JSON): 38-47% faster, still benefiting from both optimizations.
  - **Large-scale cases** (1000+ patches, large files): 39-52% faster; the file I/O optimization becomes more significant with larger JSON files.

  The caching optimization provides the most consistent gains across all scenarios since it eliminates repeated expensive operations, while the file I/O optimization scales with file size.
* fix: patch path
* codeflash suggestions
* split the worktree utils in a separate file
--------
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
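A hedged sketch of those two changes; the directory layout, metadata file name, and the `get_git_project_id()` stand-in are assumptions for illustration, not the repo's actual code:

```python
import json
from functools import lru_cache
from pathlib import Path

def get_git_project_id() -> str:
    # Stand-in for the repo's real (already cached) helper.
    return "example-project-id"

@lru_cache(maxsize=1)
def get_patches_dir_for_project() -> Path:
    # Caching the constructed Path means get_git_project_id() and Path()
    # are no longer re-invoked on every get_patches_metadata() call.
    return Path.home() / ".codeflash" / "patches" / get_git_project_id()

def get_patches_metadata() -> dict:
    meta_file = get_patches_dir_for_project() / "metadata.json"
    if not meta_file.exists():
        return {}
    # json.load() on an open handle skips the intermediate string that
    # read_text() + json.loads() would materialize.
    with meta_file.open(encoding="utf-8") as f:
        return json.load(f)
```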
Deque Comparator
* LSP reduce no of candidates
* config revert
* pass reference values to aiservices
* line profiling loading msg
--------
Co-authored-by: saga4 <[email protected]>
Co-authored-by: ali <[email protected]>
* LSP reduce no of candidates
* config revert
* pass reference values to aiservices
* fix inline condition
--------
Co-authored-by: saga4 <[email protected]>
import variable correctly
Signed-off-by: Saurabh Misra <[email protected]>
support attrs comparison
apscheduler tries to schedule jobs when the interpreter is shutting down, which can cause it to crash and leave us in a bad state
patch apscheduler when tracing
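A hedged sketch of one way such a patch could look. `BaseScheduler._process_jobs` is an APScheduler 3.x internal and `sys.is_finalizing()` is standard library; picking `_process_jobs` as the patch point is an assumption, not necessarily what the tracer actually does:

```python
import sys
from apscheduler.schedulers.base import BaseScheduler

_original_process_jobs = BaseScheduler._process_jobs

def _safe_process_jobs(self):
    # Once the interpreter starts finalizing, skip job processing instead of
    # letting the scheduler crash and leave the tracer in a bad state.
    if sys.is_finalizing():
        return None
    return _original_process_jobs(self)

# Applied before tracing starts, restored (if needed) once tracing ends.
BaseScheduler._process_jobs = _safe_process_jobs
```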
The optimized version eliminates recursive function calls by replacing the recursive `_find` helper with an iterative approach. This provides significant performance benefits:

**Key Optimizations:**

1. **Removed Recursion Overhead**: The original code used a recursive helper function `_find` that created new stack frames for each parent traversal. The optimized version uses a simple iterative loop that traverses parents sequentially without function call overhead (see the sketch after this list).
2. **Eliminated Function Creation**: The original code defined the `_find` function on every call to `find_target_node`. The optimized version removes this repeated function definition entirely.
3. **Early Exit with for-else**: The optimized code uses Python's `for-else` construct to immediately return `None` when a parent class isn't found, avoiding unnecessary continued searching.
4. **Reduced Attribute Access**: By caching `function_to_optimize.function_name` in a local variable `target_name` and reusing `body` variables, the code reduces repeated attribute lookups.

**Performance Impact by Test Case:**

- **Simple cases** (top-level functions, basic class methods): 23-62% faster due to eliminated recursion overhead.
- **Nested class scenarios**: 45-84% faster, with deeper nesting showing greater improvements as recursion elimination has more impact.
- **Large-scale tests**: 12-22% faster, showing consistent benefits even with many nodes to traverse.
- **Edge cases** (empty modules, non-existent classes): 52-76% faster due to more efficient early termination.

The optimization is particularly effective for deeply nested class hierarchies where the original recursive approach created multiple stack frames, while the iterative version maintains constant memory usage regardless of nesting depth.
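A minimal sketch of the iterative for-else traversal; the simplified `FunctionToOptimize` stand-in (parents as plain class-name strings) is an assumption rather than codeflash's real data model:

```python
import ast
from dataclasses import dataclass, field

@dataclass
class FunctionToOptimize:
    # Simplified stand-in: the real codeflash parents carry more structure.
    function_name: str
    parents: list[str] = field(default_factory=list)  # outermost class first

def find_target_node(tree: ast.Module, function_to_optimize: FunctionToOptimize):
    target_name = function_to_optimize.function_name  # hoisted attribute lookup
    body = tree.body
    for parent_name in function_to_optimize.parents:  # iterative, no recursion
        for node in body:
            if isinstance(node, ast.ClassDef) and node.name == parent_name:
                body = node.body  # descend into this class's body
                break
        else:
            return None  # for-else: parent class not found, exit early
    for node in body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == target_name:
            return node
    return None

tree = ast.parse("class Outer:\n    class Inner:\n        def target(self): pass\n")
found = find_target_node(tree, FunctionToOptimize("target", ["Outer", "Inner"]))
assert found is not None and found.name == "target"
```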
…25-09-25T14.28.58 ⚡️ Speed up function `find_target_node` by 18% in PR #763 (`fix/correctly-find-funtion-node-when-reverting-helpers`)
…node-when-reverting-helpers [FIX] Respect parent classes in revert helpers
Granular async instrumentation
…d move other merged test below; finish resolving aiservice/config/explanation/function_optimizer; regenerate uv.lock
⚡️ This pull request contains optimizations for PR #678

If you approve this dependent PR, these changes will be merged into the original PR branch `standalone-fto-async`.

📄 11% (0.11x) speedup for `CommentMapper.visit_AsyncFunctionDef` in `codeflash/code_utils/edit_generated_tests.py`

⏱️ Runtime: 5.28 milliseconds → 4.76 milliseconds (best of 117 runs)

📝 Explanation and details
The optimized code achieves a 10% speedup through several key micro-optimizations that reduce overhead in the performance-critical loops:
**Key Optimizations:**

1. **Hoisted repeated attribute lookups**: Variables like `node_body = node.body`, `original_runtimes = self.original_runtimes`, and `results = self.results` are cached once outside the loops instead of being accessed repeatedly via `self.` attribute lookups (all four patterns are combined in the sketch below).
2. **Cached type objects and method references**: `isinstance_stmt = ast.stmt`, `isinstance_control = (ast.With, ast.For, ast.While, ast.If)`, and `get_comment = self.get_comment` eliminate repeated global/attribute lookups in the hot loops.
3. **Improved string formatting**: Replaced string concatenation (`str(i) + "_" + str(j)`) with f-string formatting (`f"{i}_{j}"`), which is more efficient in Python.
4. **Optimized getattr usage**: Changed `getattr(compound_line_node, "body", [])` to `getattr(compound_line_node, "body", None)` with a conditional check, avoiding list creation when no body exists.

**Why it's faster**: The profiler shows the main performance bottleneck is in the nested loops processing control flow statements. By eliminating repetitive attribute lookups and method calls that happen thousands of times (2,729 iterations in the outer loop, 708 in nested loops), the optimization reduces per-iteration overhead.

**Test case performance**: The optimizations show the biggest gains on large-scale test cases with many statements (9-22% faster) and mixed control blocks, while having minimal impact on simple cases with few statements (often slightly slower due to setup overhead).
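A combined sketch of the four micro-optimization patterns applied to a simplified visitor; the class shape, the `get_comment()` placeholder, and the result-key scheme are assumptions, not the actual `CommentMapper` from `codeflash/code_utils/edit_generated_tests.py`:

```python
import ast

class CommentMapper(ast.NodeVisitor):
    """Simplified stand-in for the real visitor; only the patterns are real."""

    def __init__(self):
        self.results = {}

    def get_comment(self, stmt: ast.stmt) -> str:
        return "# runtime comment"  # placeholder for the real comment lookup

    def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef):
        node_body = node.body            # 1. hoist attribute lookups out of the loops
        results = self.results
        get_comment = self.get_comment   # 2. cache the bound method once...
        isinstance_control = (ast.With, ast.For, ast.While, ast.If)  # ...and the type tuple
        for i, stmt in enumerate(node_body):
            if isinstance(stmt, isinstance_control):
                # 4. None default avoids allocating a throwaway empty list
                inner = getattr(stmt, "body", None)
                if inner:
                    for j, sub in enumerate(inner):
                        results[f"{i}_{j}"] = get_comment(sub)  # 3. f-string key
            else:
                results[str(i)] = get_comment(stmt)
        return node

mapper = CommentMapper()
mapper.visit(ast.parse("async def f():\n    for x in range(3):\n        y = x\n"))
```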
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-pr678-2025-09-26T19.52.34` and push.